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Abstract 

We have employed a recent implementation of genetic algorithms to study a range of stan- 
dard benchmark functions for global optimization. It turns out that some of them arc not 
very useful as challenging test functions, since they neither allow for a discrimination be- 
tween different variants of genetic operators nor exhibit a dimensionality scaling resembling 
that of real- world problems, for example that of global structure optimization of atomic 
and molecular clusters. The latter properties seem to be simulated better by two other 
types of benchmark functions. One type is designed to be deceptive, exemplified here by 
Lunacek's function. The other type offers additional advantages of markedly increased 
complexity and of broad tunability in search space characteristics. For the latter type, we 
use an implementation based on randomly distributed Gaussians. We advocate the use of 
the latter types of test functions for algorithm development and benchmarking. 



keywords: global optimization, genetic algorithms, benchmark functions, dimensionahty 
scaling, crossover operators 



1 Introduction 



Global optimization has a lot of real-world applications, both of discrete and non-discrete 
nature. Among them are chemical applications such as structure optimization of molecules 
and clusters, engineering problems such as component design, logistics problems like schedul- 
ing and routing, and many others. Despite the typical practical finding that a general 
global optimization algorithm usually is much less efficient than specific versions tuned to 
the problem at hand, it is still of interest to gauge the baseline performance of a global 
optimization scheme using benchmark problems. Even in the most recent examples of such 
tests 1 1-6) (selected at random from the recent literature), it is customary to employ cer- 
tain standard benchmark functions, with the implicit (but untested) assumption that the 
difficulty of these benchmark functions roughly matches that of real-world applications. 
Some of these benchmark functions even are advertised as particularly challenging. 
We have developed evolutionary algorithm (EA) based global optimization strategies in 
the challenging, real-life area of atomic and molecular cluster structure optimization [7 12 



When we apply our algorithms to those traditional, abstract benchmark functions, however, 
neither of those two claims (challenge, and similarity to real- world applications) stands up. 
In fact, similar suspicions have been voiced earlier. For example, already in 1996 Whitley et 
al. [13] argued that many of the standard benchmark functions should be relatively easily 
solvable due to inherent characteristics like symmetry and separability. Some functions 
even appeared to get easier as the dimensionality of the function increases. Nevertheless, 
as the citations in the previous paragraph indicate, the same set of traditional benchmark 
functions continues to be used indiscriminately to the present day, by the majority of 
researchers in various fields. Therefore, with the present article, we address the need 
to re-emphasize those earlier findings from a practical point of view, add in other test 
functions, and extend the testing to higher dimensionality. In addition, we stress the 
conclusions that these traditional benchmark problems appear to be too simple to allow 
for meaningful comparisons between algorithms or implementation details, and that they 
do not allow conclusions about the performance of global optimization algorithms in real- 
life situations. We show that the latter is achieved better when using different kinds of 
benchmark functions. 

In these contexts, theoretical considerations often focus on classifying a problem as or 
NP 14 , 15 , or on evaluation of marginally different parameter representations 13 , 16 or hy- 



brid combinations of known test functions [13] . Additionally, translations and rotations as 
well as randomized noise are typically added to the normal benchmark functions j^. Albeit 
being a potentially viable approach to complicate the benchmark, it constrains the compa- 
rability of different benchmark results by introducing additional parameters and thereby 
incompatibilities. As this is of central importance when discussion independently devel- 
oped, novel algorithms, this discussion will not contain benchmark functions modified in 
this manner. Quite independent of such problem classifications and algorithm characteris- 
tics, however, in most real- world applications scaling with problem dimension (i.e., number 
of parameters to be optimized) plays a pivotal role. Chemical structure optimization of 
clusters is an obvious example: Of central practical importance are phenomena like cluster 
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aggregation and fragmentation, or the dependence of properties on cluster size, while iso- 
lating a single cluster size is a formidable experimental challenge. Therefore, one does not 
study a single cluster size but tries to systematically study a range of clusters g|9)|l3jl9|, 
only limited by the maximum computing capacity one has. It is obvious that smallest 
decreases in scaling (e.g. from (9(A^'^'^) to 0{N^)) may allow for significantly larger cluster 
sizes to be studied. 

The problem dimensionality scaling of the number of global optimization steps needed is 
of course linked to features of the global optimization algorithm. For evolutionary algo- 
rithms, this includes crossover and mutation operators, possible local optimization steps 
and problem-specifically tuned additional operators. We would like to present here latest 
results of some standard benchmark functions in the context of our recently developed 
framework for the evolutionary global optimization of chemical problems, ogolem [TT]. 
By screening the needed amount of global minimizing steps for solving up to 500 (in one 
case 10000) dimensional benchmark functions, we obtain the scaling of different crossover 
operators with the dimensionality of these functions. Additionally, we compare runs with 
and without local optimization steps in some cases to investigate the effect of gradient 
based minimization on the scaling. Last but not least, we compare the performance on 
these standard benchmark functions with that on different kinds of benchmark function 
that apparently present more serious challenges, coming closer to real-world problems in 
some respects. 

The present work contributes to defining a new baseline and standard for benchmarking 
global optimization algorithms. By demonstrating how a modern implementation of genetic 
algorithms scales in solving the reviewed benchmark functions, we want to encourage other 
developers of global optimization techniques to report not only results for a particular 
dimensionality of a defined benchmark function but focus on the scaling behaviour and 
compare their results to our empirical baseline. 



2 Methods and Techniques 



All calculations mentioned in this paper were carried out using our OGOLEM framework 11 
written in Java with SMP parallelism enabled. Since differing concurrency conditions can 
obviously have an impact on the benchmarking results, all calculations were carried out 
with 24 concurrent threads. 

OGOLEM is using a genetic algorithm (GA), loosely based on the standard GA proposed 



in Ref. 20 but differing in the treatment of the genetic population. Instead of a classical 



generation based global optimization scheme, a pool algorithm 10 is used. This has the 
advantage of both eliminating serial bottlenecks and reducing the number of tunables since, 
e.g., elitism is build-in and no rate needs to be explicitly specified. 

Tunables remaining with this approach are mentioned in table [T] with values kept constant 
in the benchmark runs. 

The genetic operators are based upon a real number representation of the parameters. 
Within the crossover operator, the cutting is genotype-based. Different crossover operators 
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Tunable 
(pool approach) 



Representation 
(generation approach) 



value 



pool size generation size 

global optimization steps generation size times 

number of generations 
fitness diversity threshhold which individuals 

are considered to be same 



1000 

till minimum is reached 
1 • 10-8 



Table 1: Tunables in the pool algorithm and their value during the benchmark. 



used below differ only in the numbers and positions of cuts through the genome. The 
positions are defined by randomized number(s) either being linearly distributed or Gaussian 
distributee^ with the maximum of the Gaussian being in the middle of the genome and 
with the resulting Gaussian-distributed random numbers multiplied with 0.3 to make the 
distribution sharper. In Tab. [2] the used algorithms are summarized and explained. 



Algorithm 


Crossing 


Number of Crossings 


Crossing Point 


Holland 


no 





N/A 


Germany 


yes 


1 


Gaussian 


Portugal: 1 


yes 


1 


linear 


Portugal: 3 


yes 


3 


linear 


Portugal: 5 


yes 


5 


linear 


Portugal:? 


yes 


7 


linear 



Table 2: Definition of the used algorithms. 

Mutation and mating are the same for all algorithms and tests. The mutation is a standard 
one-point genotype mutation with a probability of 5%. The actual gene to be mutated is 
chosen with a linearly distributed random number and replaced with a random number in 
between allowed borders specific to every function/application/parameter. 
Mating is accomplished by choosing two parents from the genetic pool. The mother is cho- 
sen purely randomly, whilst the father is chosen based on a fitness criterion. All structures 
in the pool are ranked by their fitness, a Gaussian distributed random number shaped with 
the factor 0.1 is chosen with its absolute value mapped to the rank in the pool. 
If a local optimization step is carried out, it is a standard L-BFGS, as described in Ref. 
(2T|[22] and implemented in RISO ^3J, with very tight convergence criteria (e.g. 1 ■ 10"^ 
in fitness). The needed gradients are analytical in all cases. 

"'^We are using in both cases the standard PRNG provided by the Java Virtual Machine (JVM) and 
defined by the Java standard. 
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Once the crossing, mutation and (if applicable) local optimization steps have been carried 
out on both children, only the fitter one will be returned to the pool. This fitter child will 
actually be added to the pool if it has a lower function value than the individual with the 
highest function value in the pool and does not violate the fitness diversity criterion. The 
fitness diversity is a measure to avoid premature convergence of the pool. Additionally, 
it promotes exploration by maintaining a minimum level of structural diversity, indirectly 
controlled via fitness values. For example, by assuming that two individuals with the same 
fitness (within a threshold) are same, it eliminates duplicates. Since the pool size stays 
constant, the worst individual will be dropped automatically, keeping the pool dynamically 
updated. 

For the benchmarking, we do not measure timings since they are of course dependent upon 
convergence criteria of the local optimization and potentially even on the exact algorithm 
used for the local optimization (e.g. L-BFGS vs. BFGS vs. CG). Therefore, our benchmark- 
ing procedure measures the number of global optimization steps where a step is defined to 
consist of mating, crossing, mutation and (if applicable) local optimization. The second 
difficulty is to define when the global optimum is found. We are using the function value 
as a criterion, trying to minimize the amount of bias one introduces by any measure. It 
should be explicitly noted here that within the benchmarking of a given test function, this 
value stays constant in all dimensionalities which should in principle increase the difficulty 
with higher dimensionalities, again minimizing the amount of (positive) bias introduced. 
For all local tests with local optimization enabled, five independent runs have been carried 
out and a standard deviation is given. 

3 Standard benchmark functions 

Any approach towards global optimization should be validated with a set of published 

benchmark functions and/or problems. In the area of benchmark functions a broad range 
of published test functions exists, designed to stress different parts of a global optimization 
algorithm. Among the most popular ones are Schwefel's, Rastrigin's, Ackley's, Schaffer's 
F7 and Schaffer's F6 functions. They have the strength of an analytical expression with a 
known global minimum and, in the case of all but the last function, they are extendable 
to arbitrary dimensionality allowing for scaling investigations. Contrary to assumptions 
made frequently, however, most of these benchmark functions do not allow to discriminate 
between algorithmic variations in the global optimization, nor do they give a true impres- 
sion of the difficulty to be expected in real-life applications, as we will demonstrate in the 
following subsections. 
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3.1 Ackley's Function 

Ackley's function has been first published in Ref. [24] and has been extended to arbitrary 



dimensionahty in Ref. 25 It is of the form 



-20 ■ exp 
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with the global minimum at Xi = 0.0. We considered this to be a relatively trivial function 
due to its shape consisting of a single funnel (see fig. [T]). Nevertheless, this function type 
potentially has relevance for real-world applications since e.g. the free energy hypersurface 
of proteins is considered to be of similar, yet less symmetric, shape. 




(a) Full search space (b) Fine structure 

Figure 1: 2D plot of Ackleys function. 

The initial randomized points were drawn from the interval 

- 32.768 <Xi< 32.768 (2) 

for all Xi, which is to our knowledge the normal benchmark procedure for this function. 
As can be seen from Fig. [2} without local optimization steps the choice of genotype operator 
makes almost no difference; all cases exhibit excellent linear scaling. The only deviating 
case is the mutation-only algorithm, Holland, which has a higher prefactor in comparison 
to the other algorithms but still exhibits the same (linear) scaling. With local optimization 
enabled, the results do vary more. Results up to 500 dimensions do not allow for a concise 
statement on the superiority of a specific crossover operator. We therefore extended the 
benchmarking range up to one thousand dimensions, hoping for a clearer picture. It should 
be noted here that on a standard contemporary 24-core compute node, these calculations 
took 3.5 minutes in average (openJDK7 on an openSUSE 11.4), demonstrating the good 
performance of our framework. 

Even with the extended benchmarking range, no concise picture could be obtained. This 
most likely means that none of the used algorithms is clearly superior to the others, in 
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Figure 2: Scaling results for Ackley's function, a) without, b) with local optimization. 



agreement with the results gained from the runs without local optimization. The only 
difference remaining is that the prefactor of the mutation-only algorithm is reduced to 
equality with the crossover algorithms. In general, the cases with enabled local optimization 
do increase the scaling slightly from 1.0 up to 1.28 and 1.51 for Germany and Holland, 
respectively. Nevertheless, even this increased scaling is still excellent and the usage of 
local optimization steps proves to significantly lower the time to solution. 
Still, these results obviously allow for the conclusion that Ackley's benchmark function 
should be considered to be of trivial difficulty since linear scaling is achievable already 
without local optimization. Any state-of-the-art global optimization algorithm should be 
capable of solving it with a scaling close to linear. 

Also, we want to demonstrate an interesting aspect of Ackley's benchmark function when 
manipulating the analytical gradient expression in a distinct manner. In general the mod- 
ification of gradients in order to simplify the problem is not unheard of in applications of 
global optimization techniques to chemical problems (see e.g. Ref. [26j) and we therefore 
consider it to be of interest. The analytical gradient for a gradient element i is defined as 

exp (— 0.2A/a) 27r sin (27rxj) exp (6) 
g[%)=^-Xi ^ \ (3) 

with 



and 



The simple modification that 



and 



a = (4) 
n 

_ Er=i cos (27rxi) 

a = ^ (6) 
n 

^ ^ cos {2TXXi) 
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does not only drastically simplify the gradient expression, but also further helps to simplify 
the global optimization problem. As can be seen in Fig. |3| this simplification effectively 
decouples the dimensions, thereby smoothing the search space. 
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(a) Exact gradient 



(b) Simplified gradient 



Figure 3: Contour plot of gradient element Xi for a 2D-Ackley's function. 

With this simplification in place, Ackley's function can be solved by simply locally optimiz- 
ing a couple of randomized individuals (the first step of our global optimization algorithms) 
up to 10000 dimensions with constant effort. 
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Figure 4: Scaling behaviour of Ackley's function with a modified gradient expression. 



3.2 Rastrigin's Function 



Rastrigin's function 27 28 does have fewer minima within the defined search space of 

- 5.12 < Xi < 5.12 (8) 

but its overall shape is flatter than Ackley's function which should complicate the general 
convergence towards the global optimum at Xi = 0.0. 
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We defined Rastrigin's function witli an additional liarmonic potential outside tlie searcli 
space to force the solution to stay witliin those boundaries when using unrestricted local 
optimization steps. 

n > 5.12 Va;^ < -5.12 : ■ xj 

f{xo...Xn) = lQ-n + Y,{ (9) 

i=o [-5.12 < < 5.12 : - 10 • cos(27ra;i) 




Figure 5: 2D plot of Rastrigin's function 

As can be seen from Fig. [6} the scaling is excellent with all tested crossing operators; Hol- 
land again being the easily rationalizable exception, in the case without local optimization. 
In the case with local optimization, a contrasting picture can be seen. 
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Figure 6: Scaling results for Rastrigin's function, a) without, b) with local optimization 
and c) zoom with local optimization. 

The scaling once more does not deviate much between the different algorithms with the 
prefactor making the major difference. Interestingly, by far the best prefactor and scaling 
is obtained with Holland. Probably, the rather non-intrusive behaviour of a mutation-only 
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operator fits this problem best since it provides a better short-range exploration than any 
of the crossing algorithms. We assume the rather high number of global optimization steps 
(also in comparison to the non-locopt case) to be due to a repeated finding of the same, non- 



optimal minima. Inclusion of taboo-search features 29 into the algorithm might be of help 
for real-world problems of such a type, not reducing the number of global optimization 
steps but the amount of time spent in local optimizations rediscovering already known 
minima. 

Just as Ackley's function discussed earlier, it is possible to conclude that Rastrigin's func- 
tion should be solvable with almost linear scaling using contemporary algorithms. 

3.3 Schwefel's Function 



In comparision to Rastrigin's function, Schwefel's function 30 adds the difficulty of being 



less symmetric and having the global minimum at the edge of the search space 

- 500.0 <Xi< 500.0 (10) 

at position Xj = 420.9687. Additionally, there is no overall, guiding slope towards the 
global minimum like in Ackley's, or less extreme, in Rastrigin's function. 
Again, we added an harmonic potential around the search space 

{X, > 500 V Xi< -500 : +0.02 ■ x^ 
(11) 
—500 < Xi < 500 : —Xi ■ sin ( ^/\xi\ ) 

since otherwise our unrestricted local optimization finds lower-lying minima outside the 
principal borders. 




Figure 7: 2D plot of Schwefels function 
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As can be seen from Fig. [8} sub-quadratic scaling can be achieved with and without local 
optimization. Once more, without local optimization the non-crossing algorithm has a 
higher prefactor but the same scaling as the others, which is equalized when turning on 
local optimizations. In this particular case, the usage of local optimization steps has the 
potential to slightly affect the scaling from 1.15 to 1.75. 
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Figure 8: Scaling results for Schwefel's function, a) without, b) with local optimization. 

Again, we must come to the conclusion that a sub-quadratic scaling is far better than 
what we would expect to obtain for real-world problems, restricting the usage of Schwefel's 
function as a test-case for algorithms designed to solve the latter. 



3.4 SchafFer's F7 function 

Schaffer's F7 function is defined as 

f{Xo...Xn) 



n 



Si ■ (sin (50.04) + 1 



1 2 



with 



in n-dimensional space within the boundaries 



100.0 < X,- < 100.0 



(12) 



(13) 



(14) 



As can be seen from Fig. |9j concentric barriers need to be overcome in order to reach the 
global minimum Xi = 0.0. Judging from the results depicted in Fig. 10, this function is 
capable of discriminating between different algorithmic implementations. For the Gaussian- 
based single point crossover {germany) the scaling is of quartic nature, whilst all the other 
algorithms are scaling linearly with the problem size. The sublinear scaling is an artifact 
of the high ratio of crossover cuts (up to seven) to problem size (only 40D) and it can be 
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(a) Full search space (b) Fine structure 

Figure 9: Plot of S chaffer's F7 function 

expected to increase to linear for higher dimensionalities. Interestingly, the performance 
of the mutation-only algorithm again is on par with the algorithms employing crossover 
operators. We see such a bias with all the benchmark functions so far, and this does not 
conform to our experience with real-life problems. Therefore, conclusions on the importance 
of mutation for global optimization may be too positive when based on an analysis using 
only these functions. 
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Figure 10: Scaling results for Schaffer's F7 function with local optimization. 



Nevertheless, we consider Schaffer's F7 function the most interesting of the benchmark 
functions studied so far. It should be explicitly noted though that a forty-dimensional 
problem size will definitely be too small when exclusively analyzing algorithms employing 
multiple crossover cuts. 
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3.5 Schaffer's F6 Function 

For completeness, we would like to present some non-scaling results using Schaffer's F6 
function benchmark: 



^ ' [1 + 0.001 ■(x2+|/2)]2 ^ > 






10^10 



(a) Full search space (b) Fine structure 

Figure 11: Plot of Schaffer's F6 function 



As can be seen in Fig. [11} the difficulty in this function is that the size of the potential 
maxima that need to be overcome to get to a minimum increases the closer one gets to the 
global minimum. 

In Tab. [3| results can be found which were obtained with Holland and with the one-point 
crossover operators (obviously, with a real-number encoded genotype approach, not more 
cuts can take place for a two-dimensional function). The results are outcomes of three 
successive runs which is sufficient to obtain a general picture of the trend. 



Algorithm 


w/ locopt 


w/o locopt 


Holland 


990±325 


180284±134848 


Germany 


1901±1282 


1609516±1905293 


Portugal: 1 


620±473 


832919±436006 



Table 3: Different results for Schaffers F6 function with one-point and zero-point crossover 
operators. 

The difference between the Portugal and Germany approach in this very special case is that 
Germany in contrast to Portugal can also yield a crossover point before the first number, 
effectively reducing it to a partial non-crossing approach. 
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Interestingly, we do see converse tendencies between the case with and without local op- 
timization. We see a clear preference of the Germany crossover operator over Portugal 
without local optimization. Taking the results of the non-crossing operator into account, 
it seems clear that without local optimization too much crossing is harmful in terms of 
convergence to the global minimum. With local optimization enabled, these differences 
disappear, sometimes even allowing the global minimum to be found in the initial (and 
therefore never crossed) pool of solutions. 

Although Schaffer's function shows an impressive difficulty for a two-dimensional function, 
it should still be easily solvable with and without local optimization. 

3.6 Scatter of the benchmarking results 

Obviously any stochastic approach is difficult to benchmark in a reliable manner. There- 
fore, we would like to discuss the scatter of the benchmarking results. We will try to 
approximate possible deviations for every crossover operator used with and without local 
optimization for a two hundred dimensional Ackley's function. For this, we present in 
Tab. |4] results from ten runs per crossover operator used. 



Algorithm 


Local opt. 


Maximum 


(% dev.) 


Minimum 


(% dev.) 


Average 


Holland 


w/ 


8991 


18.5 


6113 


19.5 


7590 




w/o 


10323905 


13.0 


7805543 


14.5 


9132168 


Germany 


w/ 


11289 


57.1 


4758 


33.8 


7186 




w/o 


4312566 


28.1 


2704586 


19.6 


3365305 


Portugal:! 


w/ 


20464 


126.8 


5191 


42.5 


9023 




w/o 


4748780 


36.7 


2095005 


39.7 


3473084 


Portugal: 3 


w/ 


7172 


27.6 


4704 


16.3 


5621 




w/o 


4640199 


45.5 


2328333 


27.0 


3189287 


Portugal: 5 


w/ 


6268 


20.2 


4873 


6.6 


5215 




w/o 


4745643 


39.4 


2160887 


36.5 


3403532 


Portugal:? 


w/ 


6510 


15.3 


4858 


14.0 


5646 




w/o 


4221855 


29.9 


2541700 


21.8 


3249622 



Table 4: Number of global optimization iterations from ten successive runs on the 200D 
Ackley function. 

Of course, the results do not and cannot take all or the maximum possible deviations into 
account since in principle there should be a probability distribution from a single iteration 
up to infinity which can only be captured adequately by an infinite amount of successive 
runs. Nevertheless, ten successive runs can be considered to give a rough idea of the 
location of the maximum. 

The impression gained in the previous sections, namely that the exact nature of the geno- 
type operator does not seem to make a difference when using local optimization, holds 
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true also with enhanced statistics for this case. Similarly, the differences seen in the runs 
without local optimizations between the crossing operators and the non-crossing operator, 
Holland, remain also when averaging over more runs. 

This allows for the conclusion that the results presented in the previous sections are giving 
a reasonably accurate picture, despite of course suffering from the inherent uncertainty in 
all stochastic methods which cannot be circumvented. 

Upon closer examination of the data in Table |4| some seemingly systematic trends can be 
observed, calling for speculative explanations. A general tendency observed is the reduced 
spread when local optimization is turned off, probably because a higher diversity can be 
maintained providing a better and more reliable convergence. Another tendency is the 
reduced spread when more — or no — crossover points are used. In the case of more 
crossover points this can be explained with more crossover points causing bigger changes 
in each step; this improves search space coverage, which in turn makes the runs more 
reproducible. For the reduced scatter of the non-crossing operator, the explanation is 
obviously the opposite, since this operator minimizes changes to the genome, allowing for 
a better close-range exploration. 

Despite of these interesting observations, we refrain from further analysis since this would 
lead us outside of the scope of the present article. 



4 Gaussian benchmark class 

To our experience from the global optimization of chemical systems, real-world problems 
are considerably more challenging than the benchmark functions described above. For 
example, in the case of the relatively trivial Lennard- Jones (LJ) clusters, the best scaling 
we could reach is cubic [8J. Therefore, we feel a need for benchmarks with a difficulty more 
closely resembling real-world problems. 

Defining new benchmark functions is of course not trivial since they should fulfill certain 
criteria. 

1. Not trivial to solve. 

2. Easy to extend to higher dimensions causing higher difficulty. 

3. Possibility to define an analytical gradient for gradient based methods. 

4. Of multimodal nature with a single and known global minimum. 

To have better control over these criteria when generating benchmark functions, a few 



"search landscape generators" have been proposed in recent years 31 33 . The simplest 
and most flexible of these is the one based on randomly distributed Gaussians 31 . For 
convenience, we have used our own implementation of this concept, abbreviated GRUNGE 
(GRUNGE: Randomized, UNcorrelated Gaussian Extrema), defined and discussed in the 
following. We would like to emphasize already at this point that our intention in using 



GRUNGE is not to re-iterate known results from Ref. 31 and similar work, but to directly 
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contrast the OGOLEM behavior displayed in section [3] with its different behavior in the 
GRUNGE benchmark. This shows strikingly that the rather uniform results in section [3] 
are not a feature of OGOLEM but rather a defect of that benchmark function class. 
We define a function as a set of randomized Gaussians 



N 



i=0 



M 



'Ci ■ ^ {^j ^i) 

j=0 



(16) 



with the random numbers ^i, Q and Ki being the weight, width and position of the i- 
th Gaussian in M-dimensional space. As can be easily seen, this class of benchmarking 
problems provides (besides the search space size) two degrees of freedom. One is the number 
N of randomized Gaussians in the search space and M being the dimensionality of the 
Gaussiansj^ More subtly, there is also a connection between these two characteristics and 
the Gaussian widths within the maximal coordinate interval (i.e., the Gaussian density). 
With proper choices of these numbers, one can smoothly tune such a benchmark function 
between the two extremes of a "mountain range" (many overlapping Gaussians) and a "golf 
course" (isolated Gaussians with large flat patches in-between). 

Functions in this class of benchmarks are not easy to solve, easily extendable in dimen- 
sionality and — through the use of Gaussians — the definition of an analytical gradient 
is trivial. The only problem remaining is to pre-determine the position and depth of the 
global minimum. Here, other benchmark function generators like, e.g., the polynomial 



ones proposed by Gaviano et al. 33 and Locatelli et al. 32 , allow for more control, but at 
the price of more uniform overall features of the generated test functions, which is exactly 
what we want to avoid. We also do not want to enforce a known global minimum by 
introducing a single, dominating Gaussian with excessively large weight by hand. Thus, 
the only remaining possibility is to define a fine grid over the search space and to run local 
optimizations starting at every grid point, to obtain a complete enumeration of all min- 
ima within the search space. Due to the simple functional form and to the availability of 
an analytical gradient, this is a realistic proposition for moderately-dimensioned examples 
(lOD) containing a sufficiently great number of sufficiently wide Gaussians. 
In contrast to the traditional benchmarks examined in the previous sections, the GRUNGE 
function is not deliberately designed to be deceptive, in any number of dimensions. In- 
stead, due to the heavy use of random numbers in its definition, it does not contain any 
correlations whatsoever. To our experience, this feature makes GRUNGE benchmarks 
much harder than any of the traditional benchmarks. We cannot offer formal proofs at 
this stage, but our distinct impression from many years of global optimization experience 
is that realistic problems tend to fall in-between these two extremes, being harder than 
traditional benchmarks but less difficult (less uncorrelated) than GRUNGE. 
Obviously, a full exploration of the randomize Gaussians set of benchmark functions re- 
quires an exclusive and extensive study, which has already been started by others |3ll. As 



already mentioned above, our sole intention here is to provide a contrast to the OGOLEM 



^As a side note, we write GRUNGE(M ,N ), e.g. for 2000 gaussians in a ten dimensional space 
GRUNGE[10,2000]. 
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behavior noted in section |3j To this end, we present results based on solving a ten- 
dimensional GRUNGE benchmark with 2000 gaussians {GRUNGE[10,2000j) within a 
search space of 

0.0 <Xi< 10.0 (17) 

with local optimization enabled. 



Algorithm 


Run 1 


Run 2 


Run 3 


Average 


Holland 


5725 


7676 


6228 


6543 


Germany 


2390 


153 


5390 


2644 


Portugal: 1 


3145 


7637 


4077 


4953 


Portugal: 3 


1879 


2575 


1339 


1931 


Portugal: 5 


2441 


1647 


4888 


2992 


Portugal:? 


5533 


369 


6322 


4074 



Table 5: GRUNGE[10,2000] benchmarking results with local optimization steps. 

As can be seen from the results in Tab. [5| the average of three independent runs of all 
algorithms yields results within the same order of magnitude. When comparing the num- 
bers in Tab. |5]with the results given above for the conventional benchmark functions, e.g., 
with those in Tab. |4| one should remember the differences in dimensionality: Here we are 
dealing with a 10-dimensional problem with 2000 minima, whereas in Tab. |4]we reported 
the performance on the 200-dimensional Ackley function with the number of minima being 
several orders of magnitude higher. This gives an indication of what we experience as a 
big difference in difficulty. 

It should be noted, however, that in two cases the global optimum could be found within 
the locally optimized initial parameter sets. While this demonstrates once more that 
a randomly distributed initial parameter set can have an extraordinary fitness, it also 
indicates that higher dimensional GRUNGE benchmarks are necessary to better emulate 
real- world problems. 

We also did some tests without local optimization, showing that the GRUNGE bench- 
mark with our randomly generated Gaussians is extremely difficult to solve without local 
optimization, requiring almost 19 million global optimization steps with Portugal:3. We 
suspect that this level of difficulty is related to inherent features of the GRUNGE bench- 
mark (e.g., to the completely missing correlation between the locations and depths of the 
minima) but also to features of the specific GRUNGE[10,2000] incarnation used here (i.e., 
this particular Gaussian distribution and density) , but decide to leave this sidetrack at this 
point. 
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5 Lunacek's function 



Lunacek's function |34|, also known as the bi- or double-Rastrigin function, is a hybrid 
function consisting of a Rastrigin and a double-sphere part and is designed to model the 
double-funnel character of some difficult LJ cases, in particular LJag. 



f{xo...Xn) = min - , |(i ■ + s ■ ^(xj - /X2)^|^ +10 y^(l-cos 27r(x— /ii)) 

(18) 

This indicates that there is an interest in developing benchmark functions of higher diffi- 
culty, and indeed the developed function provides an interesting level of difficulty, as we 
show below. Nevertheless, we would like to dispute the notion that it resembles certain real- 
world problems and the source of their difficulty. Specifically, Lunacek et al. claim that the 
global optimization of homogeneous LJ clusters is one of the most important applications of 
global optimization in the field of computational chemistry. Furthermore, they claim that 
the most difficult instances of the LJ problem possess a double-funnel landscape. The for- 
mer claim is a rather biased view and promotes the importance of homogenous LJ clusters 
from a mere benchmark system to a hot-spot of current reasearch. As the broad literature 



on global cluster structure optimization documents (cf. reviews 11,35 -37 and references 
cited therein), current challenges in this field rather are directed towards additional com- 
plications in real-life applications, e.g., how to taylor search steps to dual-level challenges 
of intra- and intermolecular conformational search in clusters of flexible molecules, or how 
to reconcile the vast number of necessary function evaluations with their excessive cost at 
the ab-initio quantum chemistry level. In terms of search difficulty, the homogeneous LJ 
case is now recognized as rather easy for most cluster sizes, interspersed with a few more 
challenging problem realizations at certain sizes, with LJ38 being the smallest and hence 
the simplest of them. This connects to the second claim by Lunacek et al, namely that 
the difficulty of LJ38 arises from the double-funnel shape of its search landscape, which 
is captured by their test function design. It is indeed tempting to conclude from the dis- 
connectivity graph analysis by Doye, Miller and Wales 38 that there are two funnels, a 
narrow one containing the fee global minimum, separated by a high barrier from the broad 
but less deep one containing all the icosahedral minima. Even if this were true (to our 
knowledge, such a neat separation of the two structural types in search space has not been 
shown), it would give rise to only two funnels in a 108-dimensional search space, which 
is not necessarily an overwhelming challenge and also not quite the same as what eq. 18 
offers. 

Lunacek's function is specifically designed to poison global optimization strategies working 
with bigger population sizes. This is achieved through the double sphere contribution 
which constructs in every dimension a fake minimum, e.g., when using the settings s = 0.7, 
d = 1.0, with the optimal minimum located at Xi = 2.5. As Lunacek et al. have proven in 
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their initial publication, the function is very efficient in doing this. This is an observation 
that we can support from some tests on 30-dimensional cases. 



Dimensionality 


Static grid 


^teps to solution 


MNIC 


2 


w /o 


833 


N/A 




w / 


1 nfi3 


9^0 


5 


w/o 


2186 


N/A 




/ 


4184 


100 


10 


w/o 


392006 


N/A 




w/ 


13922 


100 


15 


w/o 


459317 


N/A 




w/ 


604954 


50 


20 


w/o 


not found 


N/A 




w/ 


1153811 


50 


30 


w/o 


not found 


N/A 




w/ 


2826707 


20 



Table 6: Exemplary benchmarking results of the Lunacek function. All results obtained 
with local optimization steps and the germany algorithm. Not found corresponds to more 
than 10 million unsuccessful global optimization steps. MNIC is the maximum number of 
individuals allowed per grid cell. 

Clearly, this is a markedly different behavior than that observed above for Ackley's, Ras- 
trigin's, Schwefel's or Schaffer's functions, coming closer to what we experienced in tough 
application cases. Therefore, it is not surprising that additional measures developed there 
are also of some help here. One possibility is to adopt a niching strategy, similar to what 
was applied to reduce the solution expense for the tough cases of homogenous LJ clusters 
to that of the simpler ones In essence, this ensures a minimum amount of diversity in 
the population, preventing premature collapse into a non-globally optimal solution. 
The most trivial implementation of niching is to employ a static grid over search space 
and to allow only a certain maximal population per grid cell (MNIC, maximum number 
of individuals per grid cell). Already this trivial change allows the previously unsolvable 
function to be solved in 30 dimensions, as can be seen from Tab. |6j Solving higher dimen- 
sionalities (e.g. lOOD) with this approach suffers again from dimensionality explosion, this 
time in the number of grid cells. Additionally, such basic implementation truncates the 
exploitation abilities of the global optimization algorithm. This causes the algorithm with 
static niches to require more steps in those low dimensional cases where the problem can 
also be solved without. From our experience with LJ clusters, we expect more advanced 
niching strategies, for example dynamic grids, to prove useful with higher dimensionalities 
and render the function even less difficult. 

As it seems to be common wisdom in this area, applications do contain some degree of 
deceptiveness and different degrees of minima correlation, as well as different landscape 
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characteristics, sampling all possibilities between golf courses and funnels. All of that 
can be captured with the GRUNGE setup. It thus offers all the necessary flexibility and 
simplicity, combined in a single function definition. The obvious downsides are the absence 
of a pre-defined global minimum (which can also be interpreted as a guarantee for avoiding 
biases towards it) and the need to tune many parameters to achieve a desired landscape 
shape. 

6 Summary and Outlook 

Scaling investigations for four different, standard benchmark functions have been presented, 
supplemented by performance tests on a fifth function. Using straightfoward GA tech- 
niques without problem-specific ingredients, the behavior we observe in all these cases is 
markedly different from what we observe upon applying the same techniques to real-world 
problems (often including system-specific additions): All benchmarks can be solved with 
sub-quadratic scaling, whereas in real- world applications we can get cubic scaling at best 
and often have to settle for much worse. In addition, the benchmarks often do not allow 
for statistically significant conclusions regarding the performance of different crossover op- 
erators, nor for a decision on whether to include local optimization or not. Thus, overall, 
benchmarks of this type do not seem to fulfill their purpose of test beds with relevance for 
practical applications in global cluster structure optimization or similar areas. 
We have contrasted the behavior on those traditional benchmark functions with that on 
two different types of functions. One type is the "landscape generator" class, shown here 
in a particularly simple realization, namely search landscapes generated by randomly dis- 
tributed Gaussians. Varying Gaussian characteristics (depth, width, density, dimension- 
ality), search space features can be tuned at will, on the full scale between a "mountain 
range" and a "golf course". In addition, since the constituting Gaussians are completely 
uncorrelated, the difficulty of this problem class is inherently larger than that of the tra- 
ditional benchmark functions where the minima characteristics follow a simple rule by 
construction. This is strikingly reflected in our tests results on this benchmark class. 
As yet another class of benchmark functions we have shown the deceptive type, designed to 
lead global optimization astray. Their difficulty can be diminished signiflcantly by making 
the global search more sophisticated, in the case of population-based searches by ensuring 
a sufficient degree of diversity in the population. Given a sufliciently flexible setup, this 
benchmark class merely is a subclass of the landscape generators. 

Further work will be required to conflrm our suspicion that real-world problems often fall 
in-between the traditional, rather simple benchmark functions on the one hand and the 
less correlated, more deceptive ones on the other hand, both with respect to their search 
space characteristics and to the difficulty they present for global optimization algorithms. 
In any case, we hope to have shown convincingly that due to their simplicity functions 
from the traditional benchmark functions should not be used on their own, neither to aid 
global optimization algorithm development nor to judge performance. 
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